scTree usage example

This example follows the palantir tutorial notebook. Doing palantir diffusion maps is usually a good pre-preprocessing step before using elpigraph or ppt.

Importing modules and basic settings

In [1]:
from anndata import AnnData
import numpy as np
import pandas as pd
import matplotlib.pyplot as pl
from matplotlib import rcParams
import scanpy as sc
import palantir

import scTree as sct
findfont: Font family ['Raleway'] not found. Falling back to DejaVu Sans.
findfont: Font family ['Lato'] not found. Falling back to DejaVu Sans.
In [2]:
sc.settings.verbosity = 3
sc.logging.print_versions()
sc.settings.set_figure_params(dpi=80, frameon=False, figsize=(6, 6), facecolor='white')
scanpy==1.5.1 anndata==0.7.3 umap==0.4.1 numpy==1.18.1 scipy==1.4.1 pandas==0.25.3 scikit-learn==0.23.1 statsmodels==0.11.1 python-igraph==0.8.2 louvain==0.7.0 leidenalg==0.8.0

Run Palantir to obtain diffusion maps

In [3]:
counts = palantir.io.from_csv('https://github.com/dpeerlab/Palantir/raw/master/data/marrow_sample_scseq_counts.csv.gz')
norm_df = palantir.preprocess.normalize_counts(counts)
norm_df = palantir.preprocess.log_transform(norm_df)
pca_projections, _ = palantir.utils.run_pca(norm_df)
dm_res = palantir.utils.run_diffusion_maps(pca_projections, n_components=5)
ms_data = palantir.utils.determine_multiscale_space(dm_res)
tsne = palantir.utils.run_tsne(ms_data,n_jobs=20)
Determing nearest neighbor graph...
computing neighbors
    finished: added to `.uns['neighbors']`
    `.obsp['distances']`, distances for each pair of neighbors
    `.obsp['connectivities']`, weighted adjacency matrix (0:00:02)
/home/lfaure/miniconda3/envs/gpu/lib/python3.7/site-packages/palantir/utils.py:43: FutureWarning: This location for 'distances' is deprecated. It has been moved to .obsp[distances], and will not be accesible here in a future version of anndata.
  kNN = temp.uns["neighbors"]["distances"]

Create anndata object and import normalised count and palantir results

In [4]:
adata=AnnData(X=norm_df.values)
adata.obs_names=norm_df.index
adata.var_names=norm_df.columns
adata.obsm["X_palantir"]=ms_data.values
adata.obsm["X_dTSNE"]=tsne.values

Build tree using ElPiGraph algorithm

In [5]:
sct.tl.tree(adata,method="epg",Nodes=50,use_rep="palantir",
            device="gpu",seed=1,epg_lambda = 0.02)
inferring a principal tree --> parameters used 
    50 principal points, mu = 0.1, lambda = 0.02
Generating the initial configuration
Creating a chain in the 1st PC with 2 nodes
90% of the points have been used as initial conditions. Resetting.
Constructing tree 1 of 1 / Subset 1 of 1
The elastic matrix is being used. Edge configuration will be ignored
Computing EPG with  50  nodes on  4142  points and  3  dimensions
Nodes =  2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 

BARCODE	ENERGY	NNODES	NEDGES	NRIBS	NSTARS	NRAYS	NRAYS2	MSE	MSEP	FVE	FVEP	UE	UR	URN	URN2	URSD

2||50	0.0119	50	49	44	2	0	0	0.0028	0.0022	0.9972	0.9978	0.009	0.0001	0.0071	0.3539	0


9.735  seconds elapsed
    finished (0:01:50) --> added 
    'epg', dictionnary containing inferred elastic tree generated from elpigraph (adata.uns)
    'tree/B', adjacency matrix of the principal points (adata.uns)
    'tree/R', soft assignment (sigma=0.0001) of cells to principal point in representation space (adata.uns)
    'tree/F', coordinates of principal points in representation space (adata.uns)

Projecting the tree on the tsne computed from Palantir

In [6]:
sct.pl.tree(adata,basis="dTSNE")
findfont: Font family ['Bitstream Vera Sans'] not found. Falling back to DejaVu Sans.
In [7]:
sc.pl.scatter(adata,color="CD34",basis="dTSNE",color_map="viridis")

Selecting a root on the tree and projecting cells to obtain pseudotime value

In [8]:
sct.tl.root(adata,37)
root selected --> added
    'tree/root', selected root (adata.uns)
    'tree/pp_info', for each PP, its distance vs root and segment assignment (adata.uns)
    'tree/pp_seg', segments network information (adata.uns)
In [9]:
sct.tl.pseudotime(adata)
projecting cells onto the principal tree
    finished (0:01:30) --> added
    'edge', assigned edge (adata.obs)
    't', pseudotime value (adata.obs)
    'seg', segment of the tree where the cell is assigned to (adata.obs)
    'tree/df_list', list of cell projection from all mappings (adata.uns)
    'tree/img_list', list of cell/pp graph from all mappings (adata.uns)
In [11]:
sc.pl.scatter(adata,basis="dTSNE", color=['t','seg'], legend_loc='on data',color_map="viridis")

Test for associated features associated with the tree

In [12]:
sct.tl.test_association(adata,n_jobs=20)
test features for association with the tree
    mapping 0: 100%|██████████| 16106/16106 [05:36<00:00, 47.84it/s]
    found 3438 significant features (0:06:48) --> added
    'p_val' values from statistical test (adata.var)
    'fdr' corrected values from multiple testing (adata.var)
    'st' proportion of mapping in which feature is significant (adata.var)
    'A' amplitue of change of tested feature (adata.var)
In [13]:
sct.pl.test_association(adata)

Fit associated features

In [14]:
sct.tl.fit(adata,n_jobs=20)
fit features associated with the tree
    mapping 0:   1%|          | 40/3438 [00:01<02:06, 26.82it/s]
/home/lfaure/miniconda3/envs/gpu/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py:706: UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
    mapping 0: 100%|██████████| 3438/3438 [04:59<00:00, 11.49it/s]
    finished (0:05:10) --> added
    'tree/fit_list', list of fitted features on the tree for all mappings (adata.uns)
    'tree/fit_summary', summary of all fitted features on the tree for all mappings (adata.uns)

Cluster and plot fitted features

In [15]:
sct.tl.cluster(adata,knn=100)
Finding 100 nearest neighbors using minkowski metric and 'auto' algorithm
Neighbors computed in 7.285085916519165 seconds
Jaccard graph constructed in 2.8362033367156982 seconds
Wrote graph to binary file in 1.2869925498962402 seconds
Running Louvain modularity optimization
After 1 runs, maximum modularity is Q = 0.720795
After 3 runs, maximum modularity is Q = 0.725139
Louvain completed 23 runs in 4.416988372802734 seconds
PhenoGraph complete in 16.01350736618042 seconds
    finished (0:00:17) --> added
    'tree/fit_clusters', cluster assignments for features (adata.uns)
In [16]:
pd.Series(adata.uns["tree"]["fit_clusters"]).unique()
Out[16]:
array([8, 6, 2, 0, 7, 4, 1, 5, 3])
In [17]:
import matplotlib.pyplot as plt
for c in pd.Series(adata.uns["tree"]["fit_clusters"]).unique():
    sct.pl.cluster(adata,clu=c,basis="dTSNE",figsize=(10,6))
findfont: Font family ['Bitstream Vera Sans'] not found. Falling back to DejaVu Sans.